Topic classification and verification modeling for out-of-domain utterance detection

Authors

  • Tatsuya Kawahara
  • Ian R. Lane
  • Tomoko Matsui
  • Satoshi Nakamura

Abstract

The detection and handling of out-of-domain (OOD) user utterances are significant problems for spoken language systems. We address these problems with an OOD detection framework that combines topic classification and in-domain verification. In this paper, we compare the performance of three topic classification modeling schemes: 1-vs-all, in which a single classifier is trained for each topic; weighted 1-vs-all; and 1-vs-1, which combines multiple pairwise classifiers. We also compare a linear discriminant verifier with nonlinear SVM-based verification. In an OOD detection task serving as a front end for speech-to-speech translation, detection performance was comparable across all classification schemes, indicating that the simplest, 1-vs-all approach is sufficient for this task. SVM-based in-domain verification significantly reduced detection errors compared with the linear discriminant model. However, when the training and testing scenarios differed, the SVM approach was not robust, whereas the linear discriminant model remained effective.
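The two-stage framework above (topic classification, then in-domain verification) can be illustrated with a minimal sketch. It assumes plain linear scorers stand in for the trained topic classifiers (the paper's SVM models and training procedures are not reproduced), and it assumes the verifier is a linear discriminant over the vector of per-topic 1-vs-all scores. The topic names, feature vectors, weights, and biases are all hypothetical, and the weighted 1-vs-all scheme is omitted because its weighting is not defined here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps a feature vector (e.g. hypothetical word counts) to a real-valued score.
Scorer = Callable[[Sequence[float]], float]


def linear_scorer(weights: Sequence[float], bias: float) -> Scorer:
    """Return x -> w.x + b, a hypothetical stand-in for a trained topic classifier."""
    def score(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def one_vs_all_scores(x: Sequence[float],
                      classifiers: Dict[str, Scorer]) -> Dict[str, float]:
    """1-vs-all: one classifier per topic scores 'this topic vs. all others'."""
    return {topic: clf(x) for topic, clf in classifiers.items()}


def one_vs_one_votes(x: Sequence[float],
                     pair_classifiers: Dict[Tuple[str, str], Scorer]) -> Dict[str, int]:
    """1-vs-1: each pairwise classifier (a, b) votes for a if its score > 0, else b."""
    votes: Dict[str, int] = {}
    for (a, b), clf in pair_classifiers.items():
        votes.setdefault(a, 0)
        votes.setdefault(b, 0)
        votes[a if clf(x) > 0 else b] += 1
    return votes


def verify_in_domain(topic_scores: Dict[str, float], topics: List[str],
                     weights: Sequence[float], bias: float) -> bool:
    """Assumed linear discriminant verifier: accept as in-domain iff
    sum_k weights[k] * score(topic_k) + bias > 0, with topics in a fixed order."""
    g = sum(w * topic_scores[t] for w, t in zip(weights, topics)) + bias
    return g > 0


# Hypothetical toy models over a 3-dimensional feature vector.
TOPICS = ["hotel", "travel", "shopping"]
ONE_VS_ALL = {
    "hotel": linear_scorer([2.0, -1.0, 0.0], -0.5),
    "travel": linear_scorer([-1.0, 2.0, 0.0], -0.5),
    "shopping": linear_scorer([0.0, -1.0, 2.0], -0.5),
}
ONE_VS_ONE = {
    ("hotel", "travel"): linear_scorer([1.0, -1.0, 0.0], 0.0),
    ("hotel", "shopping"): linear_scorer([1.0, 0.0, -1.0], 0.0),
    ("travel", "shopping"): linear_scorer([0.0, 1.0, -1.0], 0.0),
}
VERIFIER_WEIGHTS, VERIFIER_BIAS = [1.0, 1.0, 1.0], -0.2


def is_out_of_domain(x: Sequence[float]) -> bool:
    """OOD detection: 1-vs-all topic classification followed by in-domain verification."""
    scores = one_vs_all_scores(x, ONE_VS_ALL)
    return not verify_in_domain(scores, TOPICS, VERIFIER_WEIGHTS, VERIFIER_BIAS)


if __name__ == "__main__":
    x = [1.0, 0.0, 0.5]
    print(one_vs_all_scores(x, ONE_VS_ALL))   # {'hotel': 1.5, 'travel': -1.5, 'shopping': 0.5}
    print(one_vs_one_votes(x, ONE_VS_ONE))    # {'hotel': 2, 'travel': 0, 'shopping': 1}
    print(is_out_of_domain(x))                # False
    print(is_out_of_domain([0.0, 0.0, 0.0]))  # True
```

In this toy setup the 1-vs-all scores and the 1-vs-1 votes both pick "hotel" for the first input, while an all-zero input yields uniformly low topic scores and is rejected by the verifier as OOD. Replacing `verify_in_domain` with a nonlinear scorer would correspond to the SVM-based verification variant.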

Similar resources

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching

An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user’s utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables user...

A model for specification, composition and verification of access control policies and its application to web services

Despite significant advances in the access control domain, the requirements of new computational environments such as web services still raise new challenges. The lack of appropriate methods for the specification, composition, verification, and analysis of access control policies (ACPs) has made access control in the composition of web services a complicated problem. In this paper, a new indepe...

Domain adaptation based Speaker Recognition on Short Utterances

This paper explores how in- and out-domain probabilistic linear discriminant analysis (PLDA) speaker verification behaves when enrolment and verification lengths are reduced. Experimental studies have found that when full-length utterances are used for evaluation, the in-domain PLDA approach shows more than 28% improvement in EER and DCF values over the out-domain PLDA approach and when short utterances a...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Publication date: 2004